Learning to Recognize Ancillary Information for Automatic Paraphrase Identification
Authors
Abstract
Previous work on Automatic Paraphrase Identification (PI) is mainly based on modeling text similarity between two sentences. In contrast, we study methods for automatically detecting whether a text fragment that appears in only one sentence of the evaluated pair is important or ancillary information with respect to the paraphrase identification task. Engineering features for this new task is rather difficult; we therefore approach the problem by representing text with syntactic structures and applying tree kernels to them. The results show that our automatic Ancillary Text Classifier (ATC) reaches a promising accuracy of 68.6%, and that its output can be used to improve the state of the art in PI.
Similar resources
Paraphrase Identification by Text Canonicalization
This paper proposes an approach to sentence-level paraphrase identification by text canonicalization. The source sentence pairs are first converted into surface text that approximates canonical forms. A decision tree learning module that employs simple lexical matching features then takes the canonicalized texts as its input for a supervised learning process. Experiments on the Microsoft...
Unsupervised Learning of Paraphrases
Paraphrasing constitutes a cornerstone in many Natural Language Processing fields, such as monolingual text-to-text generation and automatic text summarization. Indeed, aligned monolingual corpora are likely to boost the learning process of text-to-text generation models. A paraphrase learning strategy can be defined as a two-step process: (1) identifying and extracting related sentence pairs from...
KEC@DPIL-FIRE2016: Detection of Paraphrases in Indian Languages (Tamil)
This paper presents a report on Detecting Paraphrases in Indian Languages (DPIL), in particular the Tamil language, by the team NLP@KEC of Kongu Engineering College. Automatic paraphrase detection is an intellectual task with many applications, such as plagiarism detection and new event detection. Paraphrase is defined as the expression of a given fact in more than one way by means of dif...
Paraphrase Identification on the Basis of Supervised Machine Learning Techniques
This paper presents a machine learning approach for paraphrase identification that uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. With the objective of increasing the final performance of the system, we scrutinize the influence of the combination of lexica...
Paraphrase Identification using Machine Learning Techniques
Paraphrases are different ways of expressing the same content. Two sentences are said to be paraphrases if they are semantically equivalent. Identification of paraphrases has numerous applications, such as Information Extraction and Question Answering. Traditional systems use threshold values to decide whether two sentences are paraphrases. This threshold determination process is independe...